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1.  Introduction 

The  literature  on  change  point  problems  is,  by  now,  enormous.  Here  we 
consider  only  the  so-called  nonsequential  or  fixed  sample  size  version  although  an 
informal  sequential  procedure  which  follows  from  Smith  (1975)  is  a  routine 
consequence  (see  Section  6).  Still  the  literature  is  substantial  and  we  merely  note 
several  review  articles  which  span  both  parametric  and  nonparametric  approaches. 
These  are  Hinkley,  Chapman  and  Runger  (1980),  Siegmund  (1986),  Wolfe  and 
Schechtman  (1984)  and  Zacks  (1983). 

Our  focus  is  on  a  fully  Bayesian  parametric  approach.  Use  of  the  Bayesian 
framework  for  inference  with  regard  to  the  change  point  dates  to  work  by  Chernoff 
and  Zacks  (1964)  and  Shiryayev  (1963).  Smith  (1975)  presents  the  Bayesian 
formulation  in  the  case  of  a  finite  sequence  of  independent  observations.  In 
particular  he  addresses  three  situations:  (i)  both  the  initial  distribution  and  the 
changed  distribution  are  known,  (ii)  only  the  former  is  known  (iii)  both  are 
unknown.  Details  are  given  for  binomial  and  normal  models.  Broemeling  (1972) 
and  Menzefricke  (1981)  also  consider  normal  models.  Bacon  and  Watts  (1971), 
Ferreira  (1975),  Holbert  and  Broemeling  (1977),  Chin  Choy  and  Broemeling  (1980), 
Smith  and  Cook  (1980),  and  Moen,  Salazar  and  Broemeling  (1985)  look  at  change 
points  in  linear  models.  Booth  and  Smith  (1982)  and,  in  a  slightly  different  fashion, 
West  and  Harrison  (1986)  consider  time  series  models.  Diaz  (1982)  and  Hsu  (1982) 
study  sequences  of  gamma-type  random  variables.  Raftery  and  Akman  (1986) 
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point  and  of  ihe  model  parameters.  In  many  of  the  above  papers  unsatisfying 
compromises  have  been  made  with  regard  to  model  specification  and/or  assumed 
knowledge  of  at  least  some  of  the  model  parameters  in  order  to  reduce  the 
dimensionality  of  numerical  integrations  required  to  obtain  marginal  posterior 
distributions.  The  objective  of  this  paper  is  to  demonstrate  that,  for  a  very  broad 
range  of  hierarchical  change  point  models,  such  compromise  is  not  necessary. 
Rather,  all  desired  marginal  posterior  densities  can  be  obtained  through  a 
straightforward  iterative  Monte  Carlo  method.  Needed  random  generation,  whether 
or  not  conjugacy  is  assumed,  can  typically  be  carried  out  in  a  reasonably  efficient 
manner.  We  concede  that  for  any  particular  situation  there  may  exist  more  efficient 
methods  for  obtaining  desired  marginal  posteriors.  By  the  same  token,  we  can 
handle  many  situations  which  were  previously  inaccessible.  Moreover  the 
conceptual  simplicity  of  our  method  may  prove  an  attractive  alternative  to  the 
analytic  and/or  numerical  sophistication  demanded  by  other  methods.  The 
subsequent  development  advances  work  reported  in  Gelfand  and  Smith  (1990)  and 
Gelfand  et.  al.  (1990). 

In  particular  in  Section  2  we  formulate  the  hierarchical  Bayes  change  point 
model.  We  clarify  what  distributions  are  sought  and  what  distributions  can  be 
readily  sampled.  In  Section  3  we  briefly  review  the  Gibbs  sampler  which  underlies 
our  iterative  Monte  Carlo  method.  We  also  show  how  the  distribution  of  and 
expectations  of  arbitrary  functions  of  the  model  parameters  can  be  obtained.  In 
Section  4  through  7  we  present  a  range  of  examples.  As  an  elementary  illustration 
in  Section  4  we  apply  our  methodology  to  the  "Nile  data"  previously  analyzed  under 
simplifying  assumptions  by  Cobb  (1978)  and  Hinkley  and  Schechtman  (1987).  In 
Section  5  we  examine  the  British  coal  mining  accident  data  of  Maguire,  Pearson  and 
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Wynn  (1952)  as  extended  and  corrected  by  Jarrett  (1979).  This  data  has  been 
discussed  in  Worsley  (1986),  Raftery  and  Akman  (1986)  and  Siegmund  (1988).  Our 
approach  routinely  handles  the  complete  data  set  as  well  as  a  reduced  version  where 
20%  is  treated  as  missing.  In  Section  6  we  show  that  independent  observations  are 
not  required  to  implement  our  methodology  as  we  consider  a  change  in  transition 
matrix  for  a  sequence  of  observations  from  a  Markov  chain.  As  a  final  example  in 
Section  7  we  study  the  changing  regressions  model.  Illustrating  in  the  context  of 
simple  linear  regression  we  add  a  further  wrinkle  by  allowing  for  known  order 
between  the  original  and  changed  slope.  We  look  at  two  data  sets  —  a  real  one 
investigated  by  Bacon  and  Watts  (1971)  and  a  generated  one.  We  offer  some 
concluding  remarks  in  Section  8. 

2.  The  Bayesian  Formulation 

The  simplest  formulation  of  the  change  point  problem  assumes  f  and  g  known 
densities  with  Yj  ~  f(y),  i=l, -  •  -k,  Yi  ~  g(Y),  i=k+l,-  •  -n,  with  k  unknown  taking 
values  in  JCn  =  {1,2,*  ••n}.  Thus  k=n  is  interpreted  as  "no  change".  As  a  result 
with  Y=(Yi,-  •  -Yn)  the  likelihood  L(Y;k)  becomes 

L(Y;k)  =.n  f(Yj)  •  i  g(Y0  (1) 

1=1  i=k+l 

from  which,  for  instance,  the  MLE  for  k  can  be  directly  obtained.  Distribution 
theory  for  the  MLE  is  well  discussed  in  the  literature  (see  e.g.,  Hinkley,  1970).  The 
Bayesian  perspective  is  added  by  placing  a  prior  density  r(k)  on  Ka  whence  the 
posterior  density  of  k  |  Y  becomes 

L(Y;k)r(k)  .  (2) 

/  £  L(Y;k)r(k) 

k  =  l 

Features  of  (2)  e.g.,  mode,  quantiles,  expectations  are  easy  to  obtain.  The 
question  of  whether  or  not  a  change  has  occurred  is  addressed  through  the  posterior 
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odds  for  no  change,  P(k  =  n|Y)/{l  -  P(k  =  n|Y)}.  Since  the  model  assumes  at 
most  one  change,  to  make  inference  about  where  it  occurred  we  might  use  the  mode 
and  a  highest  posterior  density  credible  interval  of  (2).  It  is  easy  to  sample  from  (2) 
since  it  is  a  discrete  distribution  on  ICa.  Note  also  that  the  independence  of  the  Yi's 
is  not  necessary  provided  that  for  each  k  we  can  write  down  the  likelihood  L(Y;k). 
Such  a  situation  arises  in  the  Markov  chain  example  of  Section  6  and  also  for 
instance  in  changing  time  series  applications  and  more  genera)  dynamic  models. 
Nonetheless  to  simplify  notation  and  presentation,  for  the  remainder  of  this  section 
we  assume  that  the  Yi  are  independent. 

Suppose,  as  is  more  realistic,  that  the  original  and  changed  densities  are 
unknown  but  that  we  assume  a  parametric  family  f(Y|0)  for  the  former  and  a 
(possibly  different)  parametric  family  g(Yjj?)  for  the  latter.  Then  the  likelihood, 
L(Y;k,0,J7)  becomes 

.nf(Yi|*).  n  g(Yi|jj). 

1  =  1  i=k*l 

Assuming  a  prior  Tr[0,Tj,k)  on  0, tj  and  k  the  joint  distribution  of  data  and  parameters 
is 

L(Y ;k,0,i7)  ■  *<M,k)  (3) 

Interest  centers  on  the  marginal  posterior  distribution  of  k  as  well  as  that  of 
components  of  0  and  of  tj. 

For  the  moment  take  0  and  tj  to  be  scalar.  Suppose  we  seek  e.g.  the 
distribution  k|Y.  To  obtain  this  modulo  normalizing  constant  requires  a  double 
integration  over  9  and  tj;  to  obtain  it  exactly  requires  an  additional  triple 
integration  over  9 ,  tj ,  and  k.  Such  calculation  will  typically  require  numerical  or 
analytic  expertise  and,  of  course,  if  0  and  tj  are  vectors  the  situation  worsens. 
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Consider  however  the  "full”  conditional  distributions,  k|Y,0,J7,  9\Y,k,ri  and 
rj|Y,k,0.  Treating  the  data  Y  as  fixed,  suppose  that  these  full  conditional 
distributions  uniquely  determine  the  joint  (conditional)  distribution  k,0,J?|Y,  hence 
the  marginal  posteriors  which  we  seek.  Then  the  development  in  Section  3  shows 
that  suitable  sampling  from  the  full  conditional  distributions  enables  marginal 
posterior  density  estimates.  We  now  demonstrate  that  in  the  present  case  sampling 
from  the  full  conditional  distributions  will,  rather  generally,  be  straightforward. 

First  of  all,  given  9  and  tj,  k|  Y ,9,t)  is  exactly  of  the  form  (2)  and  is,  as  noted 
above,  straightforwardly  sampled.  Suppose  and  k  are  assumed  independent  so 
that  x(0,»7,k)  =  A(0)7(j7)r(k).  If  A  is  conjugate  with  f  and  7  is  conjugate  with  g  then 
0\Y,T),k  does  not  depend  upon  T]  and  is  merely  the  prior  A  updated  by  the  data 
Yr  •  'Yk  while  rj\Y,9,k  is  7  updated  by  the  data  Yt+i,-  •  -Yn.  Since  A  and  7  are 
thus  standard  parametric  families,  sampling  from  these  full  conditionals  is  routine. 
Conjugate  priors  can  be  made  arbitrarily  diffuse  and  have  an  attractive  robustness 
property  (see  Morris  1983,  p.  525). 

However,  we  may  wish  to  investigate  nonconjugate  possibly  heavier— tailed 
priors.  Suppose  we  drop  the  assumptions  of  conjugacy  for  9  and  for  77  and  also  the 
assumption  of  independence  for  9,  77  and  k.  Nonetheless  it  is  still  the  case  that,  for 
any  scalar  parameter,  its  associated  full  conditional  distribution  is  proportional  to 
L(Y;k,0,77).  But  then  random  generation  methods  such  as  the  ratio  of  uniforms 
method  or  perhaps  the  rejection  method  (see  Devroye,  1986  or  Ripley,  1986)  enable 
sampling  from  such  a  nonstandardized  integrable  density  function.  In  the  present 
paper  we  confine  ourselves  to  conjugate  examples.  Our  experience  with 
nonconjugate  models  and  the  ratio  of  uniforms  method  will  be  reported  in  a  future 
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Extension  to  general  hierarchical  Bayes  models  is  immediate.  Suppose,  for 
instance,  that  again  9  and  t)  are  independent  of  each  other  and  of  k.  Also  suppose 
that  the  prior  on  9  takes  the  form  A(0|a)  with  conjugate  prior  p(a)  on  the 
hyperparameter  a,  the  prior  on  ij  takes  the  form  'y{Tl\fl)  with  conjugate  prior  <p((J) 
on  the  hyperparameter  0.  Now  the  joint  distribution  of  the  data  and  all  parameters 
is 

L(Y;M,?)  •  r(k)  ■  A(«|«)  •  p(a)  ■  -K,|0  ■  M  (4) 

Once  again  the  full  conditionals  are  easy  to  write  down  and  to  sample  from.  In 
particular: 

k  |  Y ,9,rf,a,0  does  not  depend  upon  a  or  0  and  is  exactly  of  the  form  (2), 

9\  Y,k ,r/,a,/?  does  not  depend  upon  rj  or  0  and  is  the  prior  A  updated  by  the  data 

Yi,---Yk> 

rf\  Y,k,0,a,/7  does  not  depend  upon  9  or  a  and  is  the  prior  7  updated  by  the  data 
Yk.i,-*-Y., 

Q-|  Y,k, 9,71,0  does  not  depend  upon  Y,k,V  °r  P  aQd  is  the  hyperprior  p  updated 

by  9 , 

0\  Y,k,^,a,»/  does  not  depend  upon  Y,k,0or  a  and  is  the  hyperprior  o  updated 
by  J7. 

Again  if  the  assumed  conjugacy  and  parameter  independence  are  relaxed  the 
aforementioned  random  generation  methods  can  still  be  employed. 

3.  Review  of  the  Gibbs  Sampler 

In  the  previous  section  we  demonstrated  that,  for  rather  general  hierarchical 
Bayes  change  point  models,  sampling  from  each  of  the  full  conditional  distributions 
can  be  accomplished.  Under  mild  conditions  (see  Besag,  1974)  the  specification  of 
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ail  full  conditional  densities  uniquely  determines  the  full  joint  density  hence  ail 
marginal  densities.  The  Gibbs  sampler,  introduced  formally  in  Geman  and  Geman 
(1984)  and  subsequently  developed  in  detail  for  general  Bayesian  computations  in 
Gelfand  &  Smith  (1990)  and  Gelfand  et.  al.  (1990),  provides  a  mechanism  for 
extracting  marginal  distribution  from  the  full  conditional  distributions.  Tanner  and 
Wong  (1987)  develop  the  notion  of  substitution  sampling  which  is  closely  related  as 
discussed  in  Gelfand  and  Smith  (1990). 

For  the  remainder  of  this  section  densities  will  be  denoted  generically  by 
square  brackets  so  that  joint,  conditional  and  marginal  forms  appear  respectively  as 
[U,V]  [U|V]  and  [V].  The  usual  marginalization  by  integration  is  denoted  by  forms 
such  as  [U]  =  /[U |  V]  •  [V].  For  a  collection  of  random  variables  Uj,U2,-  •  -Upthe 
full  conditional  densities  are  thus  denoted  by  [Us|Ur,  r  i  s],  s  =  1,2,-  •  •  p  and  we 
seek  the  marginal  densities  [Us],  s  =  1,2,  *  *  -  p. 

Gibbs  sampling  is  a  Markovian  updating  scheme  which  proceeds  as  follows. 
Given  an  arbitrary  starting  set  of  values  Ul®,- •  •.Up®,  we  draw  U(,n  from 
[U,  |ui0),--.,Up®],  then  from  (U2|Uil\  uj®,- •  -.Up05]- •  •  and  so  on  up  to 
Up1’  from  (Up^lUj15- • -,Up!.iJ  to  complete  one  iteration  of  the  scheme.  After  t 
such  iterations  we  would  arrive  at  ((U,1t),*  •  •  ,UptJ).  Geman  and  Geman  show  under 
mild  conditions  that  (Ujl),-  •  -,Upl))  —  (U,  ,*  •  * ,Up)  ~  [U,  ,U2,-  •  as  t  — »  ®. 
Hence  for  t  large  enough,  Ujl)  for  example  will  be  regarded  as  a  simulated 
observation  from  [Us].  Parallel  replication  of  this  process  m  times  yields  m  iid 
p-tuples  (Ujj1,** '.Up-’)  j  =  l,-",m.  Note  that  sample  size  at  say  the  /-th 
iteration  may  be  increased  from  m  to  any  specified  size  by  randomly  reusing  the 

^  p  ) 

Usj  with  replacement. 


A  kernel  density  estimate  for  [UJ  based  upon  the  U^1  can  be  readily  obtained 
(see  e.g.,  Silverman,  1986)  and  should  be  adequate  if,  at  the  last  iteration  the 
number  of  replications,  m.  is  large  enough.  However  using  a  Rao-Blackwell 
argument  (see  Gelfand  and  Smith,  1990)  a  density  estimate  of  the  form 

[Ud-ElU.IU'^.r^l/m  (5) 

is  better  under  a  wide  range  of  loss  functions.  This  is  not  surprising  since  (5)  takes 
advantage  of  the  known  structure  in  the  model  whereas  the  kernel  density  estimate 
does  not.  The  form  (5)  is  a  discrete  mixture  distribution  and  is  in  fact  a  Monte 
Carlo  integration  to  accomplish  the  desired  marginalization. 

In  our  context  the  U,  are  the  unknown  parameters  in  the  hierarchical  change 
point  model  whence  for  instance  in  the  situation  (4)  the  density  estimate  for  k|Y 

becomes  mM.EJk|  Y,djt),r;jt)]  where  [k|Y,0,r?]  is  of  the  form  (2). 

There  may  also  be  interest  in  a  function  of  the  parameters  (see  Section  5),  i.e. 
a  function  of  the  Uj,  say  W(Ui,-  •  -UP).  Each  p-tuple,  (Ujj’,-  •  *  ,Upj 5),  provides  an 
observed  Wjl)  =  W(u|j),- •  •,Up|))  whose  marginal  distribution  is  approximately 
[W]  whence  a  kernel  density  estimate  for  [W]  using  these  W-l)can  be  developed.  A 
"Rao-BIackweilized"  density  estimate  analogous  to  (5)  can  also  be  obtained.  If  Us 
actually  appears  as  an  argument  of  W  the  full  conditional  density  [W|Ur,  r  #  s]  can 
be  obtained  by  univariate  transformation  from  (Us|Ur,  r  t  s]. 

In  concluding  this  section  we  note  that  complete  implementation  of  the  Gibbs 
sampler  requires  that  a  determination  of  t  be  made  and  that  across  iterations. 
choice(s)  of  m  be  specified.  In  a  challenging  application  some  experimentation  with 
different  settings  for  t  and  m  will  likely  be  necessary.  We  do  not  view  this  as  a 
deterrent  since  random  generation  is  generally  inexpensive  and  since  there  may  be 
no  feasible  alternative.  In  the  subsequent  examples  convergence  was  evaluated  in  a 
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univariate  manner  by  plotting  marginal  posterior  density  estimates  of  the  form  (5) 
five  iterations  apart  to  see  if  they  are  equivalent  under  a  "thick  felt  tip  pen"  test. 
Typically  a  somewhat  small  m  is  used  until  convergence  is  concluded  at  which  point 
for  a  final  iteration  m  is  increased  by  an  order  of  magnitude  to  develop  the 
presented  density  estimate.  We  make  no  claims  for  this  procedure.  Assessment  of 
convergence  is  a  complex  issue  which  we  are  currently  investigating. 

4.  Normal  Means  Problem 

As  an  elementary  illustration  we  assume  data  from  a  normal  distribution 

whose  mean  may  have  changed  at  some  point  during  the  period  of  observation. 

That  is,  Yj  ~N(0,,a2)  i  =  l,---k,  Y-,  ~  i=k+l,  "‘n-  At  the  second 

stage  we  adopt  the  following  prior  distributions:  k  discrete  uniform  on  £n;  du  62  ~ 

N(/i,r2);  <j\,  o\  ~  IG(a,b)  where  IG  denotes  the  inverse  gamma  distribution.  At  the 
third  stage  of  the  hierarchy,  let  p  ~  N(/io,^o)  ^  r2  ~  IG(c,d)  where  /10,  ^o>  a’  c’ 
and  d  are  known  constants.  This  hierarchical  structure  is  standard  in  the  Bayesian 
literature  and  has  already  been  used  with  the  Gibbs  sampler  for  comparing  p 
independent  normal  means  in  Gelfand  and  Smith  (1990).  A  simplified  version  for 
the  change  point  proolem  was  discussed  in  Smith  (1975).  Our  primary  interest 
focuses  on  the  marginal  posterior  distributions  of  k,  and  02.  AH  told  we  have  a  7 
parameter  problem.  The  7  corresponding  full  conditional  distributions  are: 

|>2  +  r2  S  Yi 

0il  Y,02, (i,r2,k  ~  N  - - - - - 

[  cr2,  +kr 


2  2 
°\T 

^2i+kr2. 
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The  Gibbs  sampler  is  thus  easily  implemented,  the  only  slight  inconvenience  being 
that  at  each  iteration  the  discrete  full  conditional  for  k  must  be  restandardized, 
hence  reevaluated  at  each  of  the  n  support  points. 

As  an  example  we  analyze  the  data  set  given  in  Table  1  which  records  annual 
volume  of  discharge  from  the  Nile  river  at  Aswan  for  the  years  1871  to  1970.  This 
data  has  been  studied  by  Cobb  (1978)  and  by  Hinkley  and  Schechtman  (1987). 
Both  provide  a  conditional  frequentist  approach  to  estimating  k  but  unrealistically 
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assume  that  all  the  parameters  except  k  are  known  using  data-based  values  n\  = 
11.0,  /12  =  8.5,  and  <7i  =  <72  =  1-25. 

(Insert  Table  1  here) 

We  chose  relatively  vague  priors  for  our  variances  by  taking  a=b  =  c  =  d  = 
2,  and  chose  the  slightly  more  informative  values  of  po  =  10  and  ao  —  2  for  the 
hyperprior  on  \i.  Figure  1  presents  the  marginal  posterior  density  estimate  for  the 
change  point,  Figure  2  for  the  mean  level  before  and  after  the  change.  Convergence 
was  achieved  within  30  iterations  and  the  densities  were  drawn  using  the  31st  with 
m  =  100. 

Clearly  there  is  strong  evidence  that  the  change  occurred  at  k  =  28,  or 
following  the  year  1898.  Our  estimate  of  the  posterior  probability  of  this  event,  .77, 
is  slightly  less  than  the  Bayes  estimate  of  .81  that  Cobb  obtained  using  his 
simplified  model,  but  more  than  .64,  the  unconditional  (frequentist)  probability  that 
the  maximum  likelihood  estimator  of  k  equals  28  using  the  asymptotic  sampling 
distribution  given  in  Hinkley  (1970).  The  posterior  modal  values  of  10.88  and  8.55 
for  0\  and  $2  respectively,  are  close  to  the  data— based  values  above.  The  marginal 
posterior  for  6\  is  more  spread  than  that  for  82  since  the  change  seems  to  have 
occurred  early  in  the  series. 

5.  A  Poisson  Process  With  Change  Point 

As  a  second  illustration  consider  Bayesian  analysis  of  a  Poisson  process  with  a 
change  point.  Such  an  analysis  was  recently  given  by  Raftery  and  Akman  (1986). 
Assuming  conjugate  priors  they  examine  the  collection  of  times  between  occurrences 
of  the  process  allowing  the  time  of  change  to  be  continuous.  For  chronologically 
ordered  time  intervals  not  necessarily  of  equal  length  we  study  the  set  of  occurrence 
counts.  We  use  a  three  stage  hierarchical  model  but  assume  that  the  change  point 
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occurs  between  intervals.  Thus  our  model  for  the  data  is  Yj  ~  Po(#ti),  i  =  1,-  •  -k, 
Yj  ~  Po(Atj),  i  =  k+1,-  •  -n.  At  the  second  stage  we  place  independent  priors  over 
k,  9  and  A  :  k  discrete  uniform  Kn,  9  ~  G(ai,bi),  and  A  ~  G(a2,b2)  where  G  denotes 
the  gamma  distribution.  At  the  third  stage  we  take  bi  ~  IG(ci,di)  independent  of 
b2  ~  IG(c2,d2)  and  assume  that  ai,  a2,  Ci,  C2,  di,  and  d2  are  known.  Our  interest 
again  lies  in  the  marginal  posterior  distributions  of  k,  9  and  A.  The  collection  of  full 
conditional  distributions  are  easily  available  as: 

9\  Y,A,bt,b2,k  ~  G  ai  +  S  Yi,  E  tj  +  b',1 
A  |  Y,0,bi,b2,k  ~  G  a2  +  E  Yi(  E  ti  +  b,1! 

k+l  lk+i 

r  -li"1 

bi  j  Y,^,A,b2,k  ~  IG  ai  +  Ci,  0  +  d, 

r  .,1*1 

b2 1  Y,0,A,bi,k  ~  IG  a2  +  C2,  A  +  d2 

. 

and 

p(k|Y,0,A,bi,b2)  =  L(Y;k,0,A)/£  L(Y;k,0,A) 

k 

k 

k  EYi 

where  L(Y;k,0,A)  =  exp{(A-0)E  tj}  •  {9/ A)1 

A  variable  of  interest  might  be  the  ratio  R  =  0/A.  Following  the  discussion 
near  the  end  of  Section  3  we  transform  from  9  to  R  to  obtain  as  full  conditional 
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distribution  R|  Y,A,bi,b2,k  ~  G(ai  +  E  Y,,  A  (E  ti  +  bt  )*  ).  A  density  estimate  as 
in  (5)  can  now  be  obtained. 

The  Gibbs  sampler,  in  general,  straightforwardly  handles  missing  data.  Such 
points  can  be  treated  as  additional  model  parameters  whose  full  conditional 
distributions  are  immediate.  However  if  the  predictive  distributions  for  such  points 
are  not  of  interest  we  would  typically  not  perform  the  additional  random  generation 
required  to  incorporate  them  as  model  parameters.  Rather,  we  would  prefer  to  use 
the  likelihood  based  solely  upon  the  observed  data.  In  the  present  case,  for 
example,  we  need  only  set  the  associated  Y  and  t  for  missing  points  to  0  in  the 
likelihood  and  proceed  as  above. 

A  much— analyzed  data  set  of  intervals  between  British  coal— mining  disasters 
during  the  112  year  period  1851-1962  was  gathered  by  Maguire  et.  al.  (1952), 
extended  and  corrected  by  Jarrett  (1979).  Frequentist  change  point  investigations 
appear  in  Worsley  (1986)  and  in  Siegmund  (1988)  while  Raftery  and  Akman  (19S6) 
apply  their  Bayesian  model.  We  apply  ours  using  yearly  intervals.  The  resulting 
observed  annual  counts  appear  in  Table  2;  k  e{  1 ,2, *  •  *,112}. 

(Insert  Table  2  here) 

Raftery  and  Akman  used  a  vague  second  stage  prior  by  taking  ai  =  a2  =  .5 
and  bi  =  b2  =  0.  We  make  our  model  comparable  by  taking  their  values  for  at  and 
a2  and  choosing  vague  third  stage  priors  for  bi  and  b2,  letting  ci  =  C2  =  0  and  di  = 
d2  =  1.  Convergence  of  the  algorithm  was  obtained  after  15  iterations  and  again  m 
=  100.  Figure  3  shows  the  density  estimates  for  k|  Y  using  the  entire  data  set  (solid 
lines)  and  using  the  data  set  with  every  fifth  year  deleted  (dashed  lines).  Note  that 
k  =  41  is  the  posterior  mode  in  both  cases,  and  that  the  three  largest  spikes  are  k  = 
39,  40  and  41,  meaning  the  change  most  probably  occured  sometime  between  late 
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1889  and  early  1892.  Raftery  and  Akman  obtained  similar  results  -  a  posterior 
mode  of  March  10,  1890  and  a  posterior  median  of  August  27,  1890.  Also  note  that 
in  the  missing  data  case,  with  no  data  at  k  =  5j  the  density  estimates  are  the  same 
for  k  =  5j  and  k  =  5j-l,  j  =  1,-  •  -,22. 

Figure  4  displays  the  density  estimates  for  8\Y  (solid  lines)  and  for  A|Y 
(dashed  lines)  for  the  original  and  reduced  data  sets,  and  Figure  5  does  the  same  for 
R|  Y.  For  all  three  pairs  of  curves,  as  expected  the  reduced  data  set  has  the  greater 
spread.  We  see  that  systematically  deleting  20%  of  the  data  has  no  effect  on  the 
posterior  modes  of  8  and  A  (3.06  and  0.89,  respectively),  and  shrinks  the  posterior 
mode  of  R  only  slightly  from  3.25  to  3.18.  Raftery  and  Akman  obtained  3.41  as  the 
posterior  mean  of  R  using  their  model.  As  a  final  remark,  we  note  that,  as  in  the 
last  example,  the  posterior  probability  that  k  =  n  is  essentially  0,  indicating  very 
strong  evidence  for  a  change. 

6.  Markov  Chain  Change  Points 

To  our  knowledge  there  is  no  previous  literature  examining  the  Markov  chain 
change  point  problem  from  a  Bayesian  viewpoint.  Nonetheless  the  problem  is 
extremely  important  arising  for  instance  in  the  analysis  of  spatial  variations  in  base 
frequencies  in  DNA  (Curnow  &  Kirkwood  1989  sec.  7)  as  well  as  in  system  user 
authentication  contexts.  Suppose  then  a  sequence  of  a  sequence  of  n  observations  Y 
=  (Y|,*  •  •  ,Yn)  from  a  process  which  is  an  p-state  stationary  Markov  chain  having 
either  transition  matrix  A  or  precisely  one  change  to  a  transition  matrix  B.  The 
entries  of  A  are  ajj  =  P(Yt*i  =  j  |  Yt  =  i)  whence  ay  >  0,  Say  =  1;  similarly  for  B 
with  entries  by.  We  take  independent  Dirichlet  priors  on  the  rows  of  A  yielding  the 

p 

prior  A(A)  =  II  D  *  (ai)  where 

i*l 
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T(i  Ail) 

Dx^mV^—k^ 

jnir(Aij) 

p 

We  do  similarly  for  B  taking  7(B)  =  II  D^fbi).  A  and  7  offer  a  rich  class  of  priors. 

i=i 

In  particular,  special  structure  for  A  and  B  can  be  modeled  through  appropriate 
choices  of  the  Ai  and  7i.  As  before,  we  denote  the  prior  on  k  by  r(k).  Note  that  the 
multinomal  change  point  problem  occurs  as  a  special  case  when  the  Yi  are 
independent  and  a,  =  a,  bi  =  b. 

We  assume  A,B,k  independent  whence  the  joint  of  distribution  of  the  data  and 
parameters  is 

f(Y|  A,B,k)  •  r(k)  •  A(A)  •  7(B)  (6) 

where  at  Y  =  y  =  (yh-  •  -  ,yn), 


f(y|A,B,k)  -  n  ay  yttl  n  byt  ytti  •  /*o(yi), 

t  =  1  >  t  =  K  > 


2  <  k  <  n-  1 


u-i 

=  fn.ayt  yt+i  ' 

t = 1  i 


k  =  n 


n  -1 

=  ,!!,byt  ytM  ‘  W)(yi)> 


k  =  1 


with  /10  denoting  the  initial  or  starting  state  distribution.  Using  (6)  and  (7)  we  see 
that  the  full  posteriors  are  as  follows: 


A|Y,B,k~nDXi+Zi(a,) 


B|Y,A,ic~  niD7i+z,(1ii) 


(9) 
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where  Zi  =  (Zn, *  •  -,ZiP)  with  Zjj  =  #  of  transitions  from  i  to  j  within  observations 
Z-  =  (Z-lt‘ • -jZ-p)  with  Zy  =  #  of  transitions  from  i  to  j  within 
observations  Yk,-  •  *,Yn,  and 

p(k|Y,A,B)  =  f(Yl^’B<k)r(k)  (10) 

l  f(Y|A,B,k)r(k) 

k  s  1 

Note  that  since  we  are  conditioning  on  all  the  data,  hence  Yi,  hq  does  not  appear  in 
(8)  or  (9)  and  cancels  in  (10).  Thus  we  need  not  worry  about  its  specification. 

A  three  stage  hierarchical  model  can  be  developed  by  placing  priors  on  the  A* 
and  7i.  Details  are  similar  to  the  previous  examples  and  are  omitted.  We  note  that 
work  on  hierarchical  models  for  contingency  tables  as  in  e.g.,  Albert  and  Gupta 
(1982)  is  closely  related. 

We  add  the  usual  Bayesian  caveat.  The  number  of  parameters  involved  in 
this  analysis  is  2r(r-l)  +  1.  Hence  the  smaller  n  is  relative  to  2r(r-l)  +  1  the  more 
the  prior  will  drive  the  posterior  densities.  Other  considerations  may  make  it 
important  to  detect  the  change  point  as  soon  as  possible  rather  than  waiting  to  the 
end  of  the  data  sequence.  Simple  updating  of  the  posterior  given  new  data  is  not 
possible  since  with  additional  data  the  domain  of  k  changes.  If  "change"  versus  "no 
change"  is  of  primary  concern,  as  an  informal  sequential  procedure  we  might  place  a 
spike  in  the  prior  r  at  k  =  n.  For  instance  r(n)  =  1/2  implies  prior  indifference 
regarding  a  change  in  the  first  n  observations.  Use  of  a  uniform  prior  would  imply 
odds  of  n— 1  to  1  for  a  change.  Monitoring  the  posterior  odds  for  no  change, 
P(k=n|  Y)/(l-P(K=nj  Y)),  will  reveal  a  downward  trend  to  warn  of  change. 

As  an  illustrative  example  we  consider  a  three  state  stationary  Markov  chain 


where 
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'0.7000 

0.1500 

0.15001 

0.3333 

0.3333 

0.3333' 

A  = 

0.3333 

0.3333 

0.3333 

,  B  = 

0.1500 

0.7000 

0.1500 

0.3333 

0.3333 

0.3333J 

0.3333 

0.3333 

0.3333 

Table  3  gives  a  sequence  of  50  observations  generated  with  a  change  after  k  =  35. 

(Insert  Table  3  here) 

r(k)  was  taken  to  be  uniform  over  (1,  -  •  -50}.  The  Dirichlet  priors  are  taken 
to  be  generalized  uniforms,  i.e.  all  Aij,  7ij  =  1.  Turning  to  the  marginal  posteriors, 
in  Figure  6  we  show  for  k|Y  a  comparison  of  the  density  at  iteration  4  (solid  line) 
and  at  iteration  5(dotted  line)  using  m  =  100.  We  would  not  conclude  convergence. 
In  Figure  7  we  compare  iterations  30  (solid  line)  and  31  (dotted  line).  Now  a 
conclusion  of  convergence  seems  appropriate.  In  fact  the  density  estimates  have 
been  roughly  this  stable  since  iteration  20.  The  marginal  posterior  for  k  has  mode 
at  33  with  roughly  60%  of  the  mass  on  the  set  {33,  34,  35}.  In  Figure  8  we  show  the 
marginal  posteriors  for  an|Y  and  bn|Y.  In  Figure  9  we  show  the  marginal 
posteriors  for  a22|Y  and  b22|Y.  Note  that  for  both  an  and  b22  modes  and 
concentration  of  mass  agree  with  the  "true"  an  and  b22  respectively.  Interestingly, 
for  bn  the  marginal  posterior  is  one-tailed.  This  arises  because  for  the  observed 
sequence  there  are  no  transitions  from  state  1  to  state  1  after  k  =  33. 

7.  Changing  Linear  Regression  Models 

Bayesian  analysis  of  changing  linear  models  includes  the  work  of  Bacon  and 
Watts  (1971),  Ferreira  (1975),  Hoibert  and  Broemeling  (1977),  Chin  Choy  and 
Broemeling  (1980),  Smith  and  Cook  (1980)  and  Moen,  Salazar  and  Broemeling 
(1985).  Typically  this  work  involves  simplified  models  and/or  considerable  analytic 
effort.  By  contrast  the  Gibbs  sampler  enables  analysis  of  complex  models  while 
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demanding  little  mathematical  and  computational  expertise  from  the  user.  In  what 
follows,  we  outline  and  exemplify  the  basic  unconstrained  model.  Furthermore,  in 
the  spirit  of  robustness  we  investigate  the  impact  of  the  choice  of  the  prior  on  the 
results.  We  then  add  the  constraint  of  ordered  slope  parameters,  and  show  the 
surprising  ease  with  which  the  Gibbs  sampler  provides  a  solution,  illustrating  with  a 
second  example. 

7.1  Basic  Unconstrained  Model 

Consider  a  three-stage  hierarchical  simple  linear  regression  model,  where  at 
the  first  stage  Yj  ~  N(aj+/?iXj,  a2),  i  =  Yj  ~  N(a2+#>xj,  a\), 

i  =  k+l,---,n.  At  the  second  stage  we  take  6\  =  (ai,/?i)T,  02  =  (02, &)T 
independent  N(0q£),  and  al,  a2  independent  IG(ao,bo).  Again  we  assume  k  follows 
a  discrete  uniform  distribution  on  Kn. 


At  the  third  stage,  we  take  the  independent  normal-Wishart  form  0O  ~ 
N(p,C),  E"1  ~  W((pV)‘ l,p).  Thus  we  have  a  13  parameter  problem  with  known 
constants  /x,  C,  V,  p,  ao  and  bo- 

3  For  a  given  k  define  B-k)  =  (<7'i2X-k)TX<ik)  +  E'1)  \  b^  =  <r'l2X<ik,TY(ik)  + 

X  X 

1  !  _  1  n  TtrUs\ra  ^  —  1  *  *  *  *  1  _  1  *  *  *  *  1  \/^k)  _____ 

S  ,  1-1,2  where  Xx  -  [xr—XkJ  ’  X2  “  (Xk*r---XQJ  ’  Yi  " 
(Yt, *  •  -,Yn)T,  Y2k)=(Yk*i,-  •  -Yn)T.  Let  A  =  (2  E  1  +  C  !)  \  Then  using  standard 
distribution  theory  we  obtain  the  following  full  conditional  distributions: 
*rN(B<ik,b<ik),  Bjk))  i  =  1,2;  (72rIG(ao  +  k/2,  {^(Y^1  -  x;k)^)T  (Y[k)  -  x[k,ft) 
+  1/bo}'1);  (r2~IG(ao  +  *jk-,  {^(Yik)  -  X;k,^)T  (Y2k)  -  xjk,fc)  +  1/bo}'1); 

flo~N(A{E*I(0I+02)+C'l/i},  A);  £-‘~W({E(0i  -  0o)(0i  -  0O)T  +  pV}*\  p+2)  and 

Mk|Y,  0U  02,  a2,  o\,  0O,  E)  =  L(Y;  k,  0U  $2,  a\,  <t2)/E  L(Y;  k,  0U  0,,  a2,  a2) 

Kn 


where  L(Y;  k,  0.,  fc,  <7“,,  a\)=  exp{-.£  l~(Y,1t’-X'1k,0i)T  (Y,Ik,-xik,tfi)}/o?  <r"2'\ 

1*1  la * 


k  n-k 


« 
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Simulation  from  the  Wishart  distribution  is  easily  done  using  the  algorithm  of 
Odell  and  Feiveson  (1966)  as  outlined  for  the  2*2  case  in  Gelfand  et.  ai.  (1990). 

We  apply  the  Gibbs  sampler  thus  defined  to  the  data  in  Table  4,  which  come 
from  Bacon  and  Watts  (1971,  p.530).  In  this  data,  X  represents  the  log  of  the  flow 
rate  of  water  down  an  inclined  channel  (g./cm.sec.),  and  Y  represents  the  log  of  the 
height  of  the  stagnant  surface  layer  (cm.)  for  different  surfactants. 

(Insert  Table  4  here) 

The  data  seem  to  indicate  a  decreasing  linear  trend  that  appears  to  become 
more  steeply  decreasing  for  X  >  0.  We  apply  our  model  using  a  vague  prior  for  the 
o\,  ao  =  0.1  and  bo  =  100,  along  with  a  vague  third-stage  prior  for  0O,  all  entries  of 
H  and  C'1  equal  to  0.  For  E*1  we  compare  two  different  Wishart  priors,  the  first 
somewhat  informative  (p  =  4,  V  =  °'q01  q°  J ),  and  the  second  more  vague  (p  =  2, 


R  = 


0  0 
0  0 


)• 


Convergence  was  obtained  within  50  iterations  and  densities  are  drawn  at  the 
51st  using  m  =  100.  Figure  10  plots  the  estimated  marginal  posterior  for  k  using  the 
informative  (solid  line)  and  vague  (dashed  line)  priors  on  I’1.  Figure  11  does  the 
same  for  0X  and  /?2,  the  slopes  before  and  after  the  change  (the  (h  estimates  are  on 

n  n 

the  left).  Clearly  with  noninformative  priors  on  0O,  ax  and  a2,  changing  the  prior 
on  XT1  can  have  a  rather  marked  effect  on  the  results.  For  example,  k's  posterior 
distribution  is  much  more  diffuse  using  the  vague  Wishart  prior.  Our  intent  here  is 
not  to  claim  one  or  the  other  solution  is  "correct,"  but  rather  to  emphasize  how 
simple  it  is  using  the  Gibbs  sampler  for  the  data  analyst  to  investigate  prior 
robustness  interactively. 

The  estimated  posterior  modes  of  0i  for  the  informative  and  vague  priors  are 
-0.42  and  -0.44  respectively,  and  for  02  they  are  -1.01  and  -0.98,  respectively. 
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Though  Bacon  and  Watts  used  a  different  parameterization,  a  somewhat  less 
elaborate  model,  vague  prior  specification,  and  treated  k  as  continuous,  their  results 
are  quite  comparable  to  ours:  their  joint  posterior  obtains  its  global  maximum  at 
Pi  =  -0.44,  P2  =  -1.02,  with  the  change  estimated  to  have  occurred  at  X  =  0.054. 
This  corresponds  to  k  =  13,  which  is  the  posterior  median  we  obtain  under  either  of 
our  prior  specifications. 

7.2  Ordered  Slopes  Model 

In  certain  applications  we  know  that  if  there  has  been  a  change,  the  change 
must  be  in  a  certain  direction.  We  would  want  to  incorporate  this  prior  knowledge 
into  our  analysis.  This  entails  placing  order  constraint(s)  on  the  parameters,  which 
makes  required  integrations  much  more  difficult,  perhaps  impossible.  The  Gibbs 
sampler  can  again  be  used  to  overcome  this  difficulty. 

Consider  the  portion  of  the  prior  involving  ^  and  fa-  Suppose  we  draw  0\  by 
first  obtaining  P\  and  then  c*i  given  P\.  Assume  that  P\  and  #>  correspond  to  the 
larger  and  smaller  respectively  of  two  independent  draws  from  the  same  marginal 
normal  prior  on  the  slopes  as  in  the  preceding  section.  Then  the  full  posteriors  for 
Pi  and  Pi  are  merely  truncated  versions  of  the  previous  full  posteriors,  truncated  so 
that  P2  <  Pi-  The  full  conditional  distributions  for  the  aj  as  well  as  for  all  other 
parameters  are  unchanged.  One  for  one  sampling  from  a  truncated  distribution  can 
be  effected  using  a  suggestion  in  Devroye  (1986,  p.  38).  More  general  order 
constraint  can  be  modeled  by  assuming  two  independent  but  not  identically 
distributed  draws  before  ordering  to  obtain  Pi  and  p2 . 

(Insert  Table  5  about  here) 

To  illustrate  these  ideas,  consider  the  data  set  in  Table  5  generated  according 
to  the  model  Yi  ~  N(i,  52),  i  =  1,*  •  •  15,  Y-l  ~  N(30-i,  52),  i  =  16*  •  29  so  that  the 
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true  /?i  and  02  are  1  and  -1,  respectively.  The  remaining  prior  structure  agrees  with 
that  of  the  previous  section  except  for  Z 1  where  we  take  Wishart  with  p  =  3  and  V 
=  I.  We  again  obtained  convergence  within  50  iterations  and  our  plots  are  based  on 
m  =  100.  Figure  12  compares  the  resulting  estimated  marginal  posteriors  for  0\  and 
02  under  the  model  of  the  previous  section  (solid  line)  and  the  present  ordered  slopes 
model  (dashed  line).  The  pair  of  0\  curves  are  to  the  right  in  the  figure.  For  0U  the 
constrained  model  yields  a  more  concentrated  posterior.  In  particular,  both  of  the 
tails  are  quite  heavy  in  the  unconstrained  model  ranging  from  roughly  -8.0  to  +13.0 
while  for  the  constrained  model  they  are  limited  to  roughly  -1.2  to  5.7.  Additional 
studies  (not  shown)  show  more  and  more  dramatic  differences  between  the  two 
models  as  p  is  decreased;  the  tails  become  heavier  and  heavier  for  the  basic  model 
while  remaining  virtually  unchanged  in  the  constrained  model. 

8.  Conclusion 

We  have  shown  how  a  broad  range  of  hierarchical  Bayes  change  point  models 
can  be  straightforwardly  analyzed  using  an  iterative  sampling  based  approach 
known  as  the  Gibbs  sampler.  Interesting  future  applications  lie  in  such  as  areas  as 
time  series  and  other  dynamic  models,  generalized  linear  models  and  nonlinear 
models  as  well  as  in  the  employment  of  nonconjugate  priors. 
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Table  1.  Annual  volume  of  the  Nile  River  (discharge  at  Aswan,  1010m3) 
from  1871  to  1970,  with  apparent  changepoint  near  1898 


Year 

Vol. 

Year 

Vol. 

1871 

11.20 

1896 

12.20 

1872 

11.60 

1897 

10.30 

1873 

9.63 

1898 

11.00 

1874 

12.10 

1899 

7.74 

1875 

11.60 

1900 

8.40 

1876 

11.60 

1901 

8.74 

1877 

8.13 

1902 

6.94 

1878 

12.30 

1903 

9.40 

1879 

13.70 

1904 

8.33 

1880 

11.40 

1905 

7.01 

1881 

9.95 

1906 

9.16 

1882 

9.35 

1907 

6.92 

1883 

11.10 

1908 

10.20 

1884 

9.94 

1909 

10.50 

1885 

10.20 

1910 

9.69 

1886 

9.60 

1911 

8.31 

1887 

11.80 

1912 

7.26 

1888 

7.99 

1913 

4.56 

1889 

9.58 

1914 

8.24 

1890 

11.40 

1915 

7.02 

1891 

11.00 

1916 

11.20 

i892 

12.10 

1917 

11.00 

1893 

11.50 

1918 

8.32 

1894 

12.50 

1919 

7.64 

1895 

12.60 

1920 

8.21 

Year 

Vol. 

Year 

Vol. 

1921 

7.68 

1946 

10.40 

1922 

8.45 

1947 

8.60 

1923 

8.64 

1948 

8.74 

1924 

8.62 

1949 

8.48 

1925 

6.98 

1950 

8.90 

1926 

8.45 

1951 

7.44 

1927 

7.44 

1952 

7.49 

1928 

7.96 

1953 

8.38 

1929 

10.40 

1954 

10.50 

1930 

7.59 

1955 

9.18 

1931 

7.81 

1956 

9.86 

1932 

8.65 

1957 

7.97 

1933 

8.45 

1958 

9.23 

1934 

9.44 

1959 

9.75 

1935 

9.84 

1960 

8.15 

1936 

8.97 

1961 

10.20 

1937 

8.22 

1962 

9.06 

1938 

10.10 

1963 

9.01 

1939 

7.71 

1964 

11.70 

1940 

6.76 

1965 

9.12 

1941 

6.49 

1966 

7.46 

1942 

8.46 

1967 

9.19 

1943 

8.12 

1968 

7.18 

1944 

7.42 

1969 

7.14 

1945 

8.01 

1970 

7.40 
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Table  2:  British  Coalmining  disaster  data  by  year  1851—1962. 

Data  from  Maguire  et.  al.  (1952)  as  corrected  by  Jarrett  (1979). 


Year 

Count 

Year 

Count 

1851 

4 

1881 

2 

1852 

5 

1882 

5 

1853 

4 

1883 

2 

1854 

1 

1884 

2 

1855 

0 

1885 

3 

1856 

4 

1886 

4 

1857 

3 

1887 

2 

1858 

4 

1888 

1 

1859 

0 

1889 

3 

1860 

6 

1890 

2 

1861 

3 

1891 

2 

1862 

3 

1892 

1 

1863 

4 

1893 

1 

1864 

0 

1894 

1 

1865 

2 

1895 

1 

1866 

6 

1896 

3 

1867 

3 

1897 

0 

1868 

3 

1898 

0 

1869 

5 

1899 

1 

1870 

4 

1900 

0 

1871 

5 

1901 

1 

1872 

3 

1902 

1 

1873 

1 

1903 

0 

1874 

4 

1904 

0 

1875 

4 

1905 

3 

1876 

1 

1906 

1 

1877 

5 

1907 

0 

1878 

5 

1908 

3 

1879 

3 

1909 

2 

1880 

4 

1910 

2 

Year 

Count 

Year 

Count 

1911 

0 

1941 

4 

1912 

1 

1942 

2 

1913 

1 

1943 

0 

1914 

1 

1944 

0 

1915 

0 

1945 

0 

1916 

1 

1946 

1 

1917 

0 

1947 

4 

1918 

1 

1948 

0 

1919 

0 

1949 

0 

1920 

0 

1950 

0 

1921 

0 

1951 

1 

1922 

2 

1952 

0 

1923 

1 

1953 

0 

1924 

0 

1954 

0 

1925 

0 

1955 

0 

1926 

0 

1956 

0 

1927 

1 

1957 

1 

1928 

1 

1958 

0 

1929 

0 

1959 

0 

1930 

2 

1960 

1 

1931 

3 

1961 

0 

1932 

3 

1962 

1 

1933 

1 

1934 

1 

1935 

2 

1936 

1 

1937 

1 

1938 

1 

1939 

1 

1940 

2 

2S 


Table  3:  A  sequence  of  50  observations  for  the  3-state  Markov  chain  example 


1 


1 


1 


2 


2 


2 


Table  4.  Stagnant  band  height  data:  x  =  log  (flow  rate  in  g./cm.see.), 
y  =  log  (band  height  in  cm.),  from  Bacon  &  Watts  (1971) 


x 

y 

x 

y 

x 

y 

X 

y 

0.11 

0.44 

0.11 

0.43 

0.34 

0.25 

-1.08 

0.99 

-0.80 

0.90 

0.11 

0.43 

1.19 

-0.65 

-1.08 

1.03 

0.01 

0.51 

-0.63 

0.81 

0.59 

-0.01 

0.44 

0.13 

-0.25 

0.65 

-0.63 

0.83 

0.85 

-0.30 

0.34 

0.24 

-0.25 

0.67 

-1.39 

1.12 

0.85 

-0.33 

0.25 

0.30 

-0.12 

0.60 

-1.39 

1.12 

0.99 

-0.46 

-0.12 

0.59 

0.70 

-0.13 

0.99 

-0.43 

-0.94 

0.92 

0.70 

-0.14 

0.25 

0.33 

Table  5:  Generated  data  for  ordered  slopes  example 


Xi 

yi 

Xi 

yi 

Xi 

y> 

Xi 

yi 

1.00 

-6.71 

9.00 

12.06 

17.00 

10.70 

25.00 

-1.28 

2.00 

-8.42 

10.00 

9.90 

18.00 

9.07 

26.00 

3.70 

3.00 

3.85 

11.00 

15.03 

19.00 

12.69 

27.00 

-2.04 

4.00 

11.25 

12.00 

9.69 

20.00 

8.12 

28.00 

2.00 

5.00 

11.96 

13.00 

21.99 

21.00 

11.18 

29.00 

-8.35 

6.00 

-4.58 

14.00 

14.20 

22.00 

3.58 

7.00 

3.90 

15.00 

6.04 

23.00 

10.80 

8.00 

13.63 

16.00 

16.43 

24.00 

12.76 

.  77 


.04 


A|Y  (reduced  data) 


using  full  data  and  20%  missing  (reduced)  data 


V  R|Y  (reduced  data) 


using  full  data  and  20%  missing  (reduced)  data 


.84 


.  00 


informative  case 


informative 


unordered  =  _ ,  ordered 
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